Prediction of hematoma expansion in spontaneous intracerebral hemorrhage using a multimodal neural network

Hematoma expansion occurs in some patients with intracerebral hemorrhage (ICH) and is associated with poor outcome. Multimodal neural networks that combine convolutional neural network (CNN) analysis of images with neural network analysis of tabular data have shown promising results in prediction and classification tasks. We aimed to develop a reliable multimodal neural network model that comprehensively analyzes CT images and clinical variables to predict hematoma expansion. We retrospectively enrolled ICH patients at four hospitals between 2017 and 2021, assigning patients from three hospitals to the training and validation dataset and patients from the fourth hospital to the test dataset. Admission CT images and clinical variables were collected, and CT findings were evaluated by experts. Three types of models were developed and trained: (1) a CNN model analyzing CT images, (2) a multimodal CNN model analyzing CT images and clinical variables, and (3) a non-CNN model analyzing CT findings and clinical variables with machine learning. The models were evaluated on the test dataset, focusing first on sensitivity and second on the area under the receiver operating characteristic curve (AUC). Two hundred seventy-three patients (median age, 71 years [59–79]; 159 men) in the training and validation dataset and 106 patients (median age, 70 years [62–82]; 63 men) in the test dataset were included. Sensitivity and AUC were 1.000 (95% confidence interval [CI] 0.768–1.000) and 0.755 (95% CI 0.704–0.807) for the CNN model; 1.000 (95% CI 0.768–1.000) and 0.799 (95% CI 0.749–0.849) for the multimodal CNN model; and 0.857 (95% CI 0.572–0.982) and 0.733 (95% CI 0.625–0.840) for the non-CNN model. We developed a multimodal neural network model incorporating CNN analysis of CT images and neural network analysis of clinical variables to predict hematoma expansion in ICH. The model was externally validated and showed the best performance of all the models.


[Patient selection flow chart. Training and validation dataset: excluded, n=1063. Test dataset: screened ICH patients for eligibility, n=312; excluded, n=206 (CT slice thickness > 2 mm, n=9; baseline CT at > 24 hours of onset, n=4; unknown onset time, n=112; no follow-up CT or follow-up CT at > 30 hours after baseline CT, n=23; secondary cause of ICH, n=6; surgical treatment before follow-up CT, n=41; insufficient data, n=11); included for the study, n=106.]

Image acquisition and segmentation
CT scans were performed in the supine position at 120 kVp with a slice thickness of 0.5–2.0 mm and an image shape of 512 × 512 or greater; the images were exported as original images in Digital Imaging and Communications in Medicine (DICOM) format. For baseline and follow-up CT scans, intraparenchymal hematomas were manually segmented in 3D Slicer by two raters, board-certified stroke specialists with more than 15 years of experience, and hematoma volumes were calculated by planimetry (Fig. 2). Intraventricular hematomas were neither segmented nor included in the hematoma volume. Hematoma expansion was defined as a volume increase between baseline and follow-up CT scans of more than 6 cm³ or more than 33% of the baseline volume 11–13,15,16. According to this definition, every patient was labeled as having hematoma expansion or no hematoma expansion. On the baseline CT, segmented areas were assigned a value of 1 and all other areas a value of 0; these images were exported as masked images in DICOM format for the CNN analysis (Fig. 2).
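The volume calculation and labeling rule can be sketched in a few lines. This is a minimal illustration, assuming each segmentation is available as a binary NumPy array with known voxel spacing in mm; the function names are hypothetical, and planimetry is approximated as voxel count times voxel volume, which may differ in detail from the 3D Slicer computation used in the study.

```python
import numpy as np


def hematoma_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Planimetric volume: number of segmented voxels times voxel volume.

    mask: binary array (1 = intraparenchymal hematoma, 0 = everything else).
    spacing_mm: voxel size along each array axis, in mm.
    1 mL = 1 cm^3 = 1000 mm^3.
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0


def has_expansion(baseline_ml: float, followup_ml: float,
                  abs_ml: float = 6.0, rel: float = 0.33) -> bool:
    """Expansion: increase of more than 6 cm^3 or more than 33% of baseline volume."""
    increase = followup_ml - baseline_ml
    return increase > abs_ml or increase > rel * baseline_ml


# 1000 voxels of 2.0 x 0.5 x 0.5 mm (0.5 mm^3 each) correspond to 0.5 mL
mask = np.zeros((10, 100, 100), dtype=np.uint8)
mask[0, :10, :] = 1
print(hematoma_volume_ml(mask, (2.0, 0.5, 0.5)))  # prints 0.5
```

Both criteria are checked independently, so a small hematoma can qualify through the relative threshold alone (10 mL growing to 14 mL is a 40% increase).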

Evaluation of CT findings at baseline
The following baseline CT findings were evaluated as predictors of hematoma expansion: blend sign, intrahematoma hypodensities, and irregular shape. Blend sign was defined as the blending of a relatively hypoattenuating area with an adjacent hyperattenuating area within the hematoma 6,7,9. Intrahematoma hypodensities were defined as the presence of any hypodense region encapsulated within the hematoma and separated from the surrounding parenchyma 7,10. Irregular hematoma shape was defined as having two or more edge irregularities 4,5,7. These findings were assessed independently by two raters, who had been trained beforehand on at least 10 patients with ICH not included in this study. In case of disagreement, the findings were reassessed by both raters together until a consensus was reached. Hematoma location and intraventricular extension of the hematoma were also evaluated. CNN Model 1 used unmodified original CT images as input. CNN Model 2 used only intraparenchymal hematoma images, which were generated from the original and masked images (Fig. 3).

Designing models for prediction
Multimodal CNN Model 1 used all available clinical variables as one input. For Multimodal CNN Model 2, we first compared all clinical variables between expansion and no-expansion cases in the training and validation dataset using univariate analyses, and used only the clinical variables significantly associated with hematoma expansion. For both models, the other input was either the original images or the intraparenchymal hematoma images, whichever performed better in the comparison between CNN Model 1 and 2.
Non-CNN Model 1 and 2 employed k-nearest neighbors as the algorithm, whereas Non-CNN Model 3 and 4 employed a fully connected neural network. Non-CNN Model 1 and 3 used all available CT findings and clinical variables as input, whereas Non-CNN Model 2 and 4 used only the CT findings and clinical variables significantly associated with hematoma expansion in univariate analyses between expansion and no-expansion cases in the training and validation dataset.
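The section does not name the univariate tests. As a hedged sketch, the variable-selection step could look like the following, assuming Mann-Whitney U tests for continuous variables, Fisher's exact test for binary variables, and a threshold of p < 0.05; the helper name `select_variables` and the example column names are hypothetical.

```python
import numpy as np
from scipy import stats


def select_variables(data: dict, y: np.ndarray, alpha: float = 0.05) -> list:
    """Return names of variables that differ between expansion (y == 1)
    and no-expansion (y == 0) cases at p < alpha."""
    y = np.asarray(y).astype(bool)
    selected = []
    for name, values in data.items():
        x = np.asarray(values)
        if set(np.unique(x)) <= {0, 1}:
            # Binary variable: 2 x 2 table of variable value vs. expansion status
            table = [[np.sum((x == 1) & y), np.sum((x == 0) & y)],
                     [np.sum((x == 1) & ~y), np.sum((x == 0) & ~y)]]
            _, p = stats.fisher_exact(table)
        else:
            # Continuous variable: compare the two groups' distributions
            _, p = stats.mannwhitneyu(x[y], x[~y], alternative="two-sided")
        if p < alpha:
            selected.append(name)
    return selected


# Hypothetical toy data: 10 expansion and 10 no-expansion cases
y = np.array([1] * 10 + [0] * 10)
data = {
    "systolic_bp": np.concatenate([np.arange(180, 190), np.arange(120, 130)]),
    "anticoagulant": np.array([1] * 8 + [0] * 2 + [0] * 10),
    "noise": np.tile(np.arange(1, 11), 2),
}
print(select_variables(data, y))
```

The selected names would then define the clinical-variable input of Multimodal CNN Model 2 and the input of Non-CNN Model 2 and 4.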

Preprocessing
All processing was done with Keras (version 2.12.0), a deep learning application programming interface in Python, running with 40 GB of GPU memory. All of the code in this study is available on GitHub (https://github.com/AI-neurosurg/Multimodal-network-for-predicting-hematoma-expansion-in-ICH).
For clinical variables, continuous variables in the training and validation dataset were first standardized. The test dataset was then standardized using the mean and standard deviation of the training and validation dataset.
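A minimal sketch of this standardization, assuming the clinical variables are held in a NumPy array and the indices of the continuous columns are known; the helper name is hypothetical. scikit-learn's `StandardScaler` uses the population standard deviation (ddof = 0), which the section does not specify.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize(train: np.ndarray, test: np.ndarray, continuous_cols: list):
    """Z-score the continuous columns using statistics of the training/validation data only."""
    train, test = train.astype(float).copy(), test.astype(float).copy()
    scaler = StandardScaler().fit(train[:, continuous_cols])
    train[:, continuous_cols] = scaler.transform(train[:, continuous_cols])
    # The test set reuses the training/validation mean and SD; no refitting
    test[:, continuous_cols] = scaler.transform(test[:, continuous_cols])
    return train, test


# Column 0 is continuous, column 1 is binary and left unchanged
train = np.array([[1.0, 0.0], [3.0, 1.0]])
test = np.array([[5.0, 1.0]])
train_z, test_z = standardize(train, test, continuous_cols=[0])
```

Fitting only on the training and validation data keeps any information about the external test hospital out of the preprocessing.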
Prior to image processing, all DICOM files were converted to Neuroimaging Informatics Technology Initiative (NIfTI) files. Preprocessing for the CNN was performed separately for original and masked images. For the original images, the following steps were executed: (1) density scaling, (2) reslicing, (3) pixel size unification, and (4) resizing. First, after extracting the brain and hematoma by thresholding Hounsfield units between 0 and 100, the pixel values were scaled to between 0 and 1 by dividing them by 100. Second, the images were resliced to a slice thickness of 2 mm, and the number of slices was set to 80; all slices at or beyond the 81st position from the most cranial slice were deleted. Third, the axial pixel size was unified to 0.5 × 0.5 mm, because image magnification varied between CT scans. Because this made all image shapes slightly smaller than 512 × 512, padding was applied to keep the image shape at 512 × 512. Fourth, the images were resized from 512 × 512 to 256 × 256 to fit in GPU memory. The preprocessed original images were used as input to CNN Model 1. For the masked images, only the second to fourth steps were executed, since the masked images were already binary (0 or 1). Intraparenchymal hematoma images for CNN Model 2 were generated from the preprocessed original and masked images (Fig. 3).
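The four preprocessing steps can be sketched with NumPy and SciPy as below. This is not the authors' code: it assumes volumes are already loaded as (slice, row, column) arrays with slice 0 most cranial, that voxels outside 0 to 100 HU are set to 0, that volumes with fewer than 80 slices are zero-padded, and that the intraparenchymal hematoma image is the element-wise product of original and mask. All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def scale_density(hu: np.ndarray) -> np.ndarray:
    """Step 1: keep 0-100 HU (brain and hematoma), then scale to 0-1."""
    windowed = np.where((hu >= 0) & (hu <= 100), hu, 0)  # outside window -> 0 (assumed)
    return (windowed / 100.0).astype(np.float32)


def fit_center(vol: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad (or center-crop) the two in-plane axes to size x size."""
    out = np.zeros(vol.shape[:1] + (size, size), dtype=vol.dtype)
    h, w = vol.shape[1:]
    ch, cw = min(h, size), min(w, size)
    sy, sx = (h - ch) // 2, (w - cw) // 2
    oy, ox = (size - ch) // 2, (size - cw) // 2
    out[:, oy:oy + ch, ox:ox + cw] = vol[:, sy:sy + ch, sx:sx + cw]
    return out


def reslice_and_resize(vol, spacing_mm, order, n_slices=80, z_mm=2.0,
                       xy_mm=0.5, pad_to=512, out_xy=256):
    """Steps 2-4. order=1 (linear) for density images, order=0 (nearest) for masks."""
    dz, dy, dx = spacing_mm
    # Steps 2 and 3: resample to 2.0 mm slices and 0.5 x 0.5 mm pixels
    vol = ndimage.zoom(vol, (dz / z_mm, dy / xy_mm, dx / xy_mm), order=order)
    # Keep the 80 most cranial slices; zero-pad if there are fewer (assumption)
    stack = np.zeros((n_slices,) + vol.shape[1:], dtype=np.float32)
    kept = vol[:n_slices]
    stack[:kept.shape[0]] = kept
    # Pad the in-plane shape back to 512 x 512
    stack = fit_center(stack, pad_to)
    # Step 4: downsample 512 x 512 -> 256 x 256
    return ndimage.zoom(stack, (1, out_xy / pad_to, out_xy / pad_to), order=order)


def hematoma_only(original_pre: np.ndarray, mask_pre: np.ndarray) -> np.ndarray:
    """Intraparenchymal hematoma image: original values inside the mask, 0 elsewhere."""
    return original_pre * (mask_pre > 0.5)
```

Using nearest-neighbor interpolation (`order=0`) for the masks is what keeps them binary through steps 2 to 4, which matches the text's note that masks skip only the density-scaling step.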
From the training and validation dataset, 70% of patients were randomly assigned to the training set and the rest to the validation set. To balance the ratio of expansion to no-expansion cases, data augmentation and random oversampling were applied only to expansion cases in the training set. Data augmentation was used in the CNN and multimodal CNN models, where images were flipped and rotated by 30 degrees. Random oversampling was used in the non-CNN models.
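The split and class-balancing steps might look like the following sketch. The helper names are hypothetical; the flip axis (left-right) and a fixed 30-degree in-plane rotation are assumptions, since the text does not say whether the angle is fixed or sampled. Oversampling here duplicates randomly chosen expansion cases until both classes are the same size.

```python
import numpy as np
from scipy import ndimage
from sklearn.model_selection import train_test_split


def split_train_val(indices, seed=0):
    """Randomly assign 70% of the training/validation dataset to training."""
    return train_test_split(indices, train_size=0.7, random_state=seed)


def random_oversample(X, y, seed=0):
    """Duplicate random expansion cases (y == 1) until classes are balanced.
    Assumes expansion cases are the minority class."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]


def augment(volume):
    """Two extra copies of a (slice, row, col) volume: a left-right flip and an
    in-plane rotation by 30 degrees (flip axis and fixed angle are assumptions)."""
    flipped = volume[:, :, ::-1]
    rotated = ndimage.rotate(volume, 30, axes=(1, 2), reshape=False, order=1)
    return [flipped, rotated]


train_idx, val_idx = split_train_val(np.arange(273))
X = np.arange(20).reshape(10, 2)
y = np.array([1, 1] + [0] * 8)
X_bal, y_bal = random_oversample(X, y)
```

`reshape=False` keeps the rotated volume at its original shape, so augmented copies can be stacked with the unmodified training images.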

Model architecture
CNN Model 1 and 2 were composed of four 3-dimensional convolutional layer blocks with batch normalization, ReLU activation function, and max pooling, followed by a dense layer block with global average pooling, ReLU activation function, and dropout (Fig. 4a). A final dense layer with sigmoid activation function was placed at the end. The kernel sizes in the four convolutional layers were 19 × 19 × 7, 19 × 19 × 7, 14 × 14 × 5, and 11 × 11 × 4, respectively. Multimodal CNN Model 1 and 2 consisted of two parts: an image part and a clinical-variables part (Fig. 4b). The image part had the same architecture as the CNN Models except for the final block. The clinical-variables part was composed of two dense layer blocks with batch normalization, ReLU activation function, and dropout. The outputs of the image and clinical-variables parts were concatenated, followed by a dense layer block with batch normalization, ReLU activation function, and dropout, and a final dense layer with sigmoid activation function.
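A hedged Keras sketch of the two architectures follows. Only the block structure, kernel sizes, and Adam settings come from the text; the filter counts, dense widths, dropout rate, pooling size of 2, and the input layout (256, 256, 80, 1) with the slice axis last are assumptions, and the published GitHub code should be treated as authoritative.

```python
from tensorflow import keras
from tensorflow.keras import layers

KERNELS = [(19, 19, 7), (19, 19, 7), (14, 14, 5), (11, 11, 4)]
FILTERS = [8, 16, 32, 64]  # hypothetical filter counts
DROPOUT = 0.3              # hypothetical dropout rate


def image_part(inp, dense_units=64):
    """Four Conv3D blocks, then the global-average-pooling/dense/dropout block."""
    x = inp
    for kernel, filters in zip(KERNELS, FILTERS):
        x = layers.Conv3D(filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling3D(pool_size=2)(x)
    x = layers.GlobalAveragePooling3D()(x)
    x = layers.Dense(dense_units, activation="relu")(x)
    return layers.Dropout(DROPOUT)(x)


def dense_block(x, units):
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.Dropout(DROPOUT)(x)


def compile_model(model):
    # Loss and Adam settings as reported in the Training section
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                        beta_2=0.999, epsilon=1e-07),
        loss="binary_crossentropy",
        metrics=[keras.metrics.Recall(name="sensitivity"), keras.metrics.AUC(name="auc")],
    )
    return model


def build_cnn(input_shape=(256, 256, 80, 1)):
    inp = keras.Input(shape=input_shape)
    out = layers.Dense(1, activation="sigmoid")(image_part(inp))
    return compile_model(keras.Model(inp, out))


def build_multimodal(n_clinical, input_shape=(256, 256, 80, 1)):
    img_in = keras.Input(shape=input_shape, name="image")
    clin_in = keras.Input(shape=(n_clinical,), name="clinical")
    clin = dense_block(dense_block(clin_in, 32), 16)  # two dense blocks
    x = layers.Concatenate()([image_part(img_in), clin])
    x = dense_block(x, 32)
    out = layers.Dense(1, activation="sigmoid")(x)
    return compile_model(keras.Model([img_in, clin_in], out))
```

For Multimodal CNN Model 2, `n_clinical` would be the five selected clinical variables; the image part omits the CNN's final sigmoid layer, as in Fig. 4b.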
In Non-CNN Model 1 and 2, the k-nearest neighbors algorithm from the scikit-learn machine learning library (version 1.2.2) was adopted. The number of neighbors, a hyperparameter, was chosen from 3, 5, 7, 9, and 11. Non-CNN Model 3 and 4 were composed of three dense layer blocks with batch normalization, ReLU activation function, and dropout, followed by a final dense layer block with sigmoid activation function.
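The fully connected network of Non-CNN Model 3 and 4 can be sketched in Keras as follows. The layer widths and dropout rate are hypothetical, and the function name is illustrative only.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_dense_classifier(n_features, units=(64, 32, 16), dropout=0.3):
    """Three Dense-BatchNorm-ReLU-Dropout blocks and a sigmoid output.
    Layer widths and dropout rate are hypothetical."""
    inp = keras.Input(shape=(n_features,))
    x = inp
    for u in units:
        x = layers.Dense(u)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inp, out)


model = build_dense_classifier(n_features=9)
```

Unlike k-nearest neighbors, this network returns a continuous probability, which is why it can be shown as a receiver operating characteristic curve.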
The models predicted a binary label: hematoma expansion or no hematoma expansion. No handling of missing data was needed, because all required data, including images and clinical variables, were complete for all patients.

Training, validation and test
For the CNN Models, Multimodal CNN Models, and Non-CNN Model 3 and 4, training was performed for 70 epochs with a batch size of 2, using binary cross-entropy as the loss function and Adam as the optimizer 27. Adam was configured with learning rate = 0.001, beta 1 = 0.9, beta 2 = 0.999, and epsilon = 1e-07. A cut-off value of 0.5 was used to binarize the predictions. At each epoch, sensitivity and the area under the receiver operating characteristic curve (AUC) were calculated on the validation set to monitor the training process. From all epochs, the five sets of trained model weights with better validation sensitivity and AUC were selected and used for testing, and the final test result was derived from the weights with the highest sensitivity.
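The per-epoch monitoring and weight selection can be sketched as below. The section does not give the exact ranking rule for "better sensitivity and AUC", so this sketch assumes a lexicographic ranking (sensitivity first, AUC as tie-break) and a cut-off of `>= 0.5`; the function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def sensitivity(y_true, y_prob, cutoff=0.5):
    """Share of expansion cases predicted positive at the given cut-off."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= cutoff
    return float(np.sum(pred & (y_true == 1)) / np.sum(y_true == 1))


def epoch_metrics(y_val, y_prob):
    """Validation sensitivity and AUC for one epoch's predictions."""
    return sensitivity(y_val, y_prob), float(roc_auc_score(y_val, y_prob))


def top_epochs(history, k=5):
    """history: list of (epoch, val_sensitivity, val_auc).
    Ranks by sensitivity first and AUC second (assumed rule)."""
    ranked = sorted(history, key=lambda r: (r[1], r[2]), reverse=True)
    return [epoch for epoch, _, _ in ranked[:k]]


history = [(1, 0.8, 0.70), (2, 1.0, 0.65), (3, 1.0, 0.72), (4, 0.9, 0.80),
           (5, 0.6, 0.90), (6, 0.9, 0.75), (7, 0.5, 0.50)]
print(top_epochs(history))
```

The weights saved at the returned epochs would then be evaluated on the test dataset.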
For Non-CNN Model 1 and 2, the k-nearest neighbors algorithm was fitted to the training and validation dataset for each candidate number of neighbors and then applied to the test dataset; the final result was derived from the number of neighbors with the highest sensitivity.
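A minimal scikit-learn sketch of this procedure, following the text in selecting the number of neighbors by test-set sensitivity; the function name and toy data are hypothetical.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.neighbors import KNeighborsClassifier


def select_n_neighbors(X_fit, y_fit, X_test, y_test, candidates=(3, 5, 7, 9, 11)):
    """Fit k-NN for each candidate k; return (best_k, {k: test sensitivity})."""
    scores = {}
    for k in candidates:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit)
        scores[k] = recall_score(y_test, clf.predict(X_test))  # sensitivity
    best_k = max(scores, key=scores.get)  # first k reaching the maximum
    return best_k, scores


# Toy 1-D data: 12 expansion cases around +10..21, 12 no-expansion cases around -21..-10
X_fit = np.concatenate([np.arange(10, 22), np.arange(-21, -9)]).reshape(-1, 1).astype(float)
y_fit = np.array([1] * 12 + [0] * 12)
X_test = np.array([[9.5], [-9.5]])
y_test = np.array([1, 0])
best_k, scores = select_n_neighbors(X_fit, y_fit, X_test, y_test)
```

Because `KNeighborsClassifier.predict` returns hard labels, these models have no continuous score, which is why they cannot be drawn as ROC curves.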

Ethical approval
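This study was approved by the following institutional review boards: Mie Chuo Medical Center institutional review board [permit number: MCERB-202321], Matsusaka Chuo General Hospital institutional review board [permit number: 325], Suzuka Kaisei Hospital institutional review board [permit number: 2020-05], and Mie University Hospital institutional review board [permit number: T2023-7].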

Results
After applying the inclusion and exclusion criteria, 273 patients were assigned to the training and validation dataset and 106 patients to the test dataset. Patient characteristics of the study population are shown in Table 1. The raw data are stored in OSF in comma-separated values format (https://osf.io/jmnzs).
Among CT findings, intrahematoma hypodensities, hematoma location, and hematoma volume differed significantly between expansion and no-expansion cases in univariate analyses of the training and validation dataset; these were used as input in Non-CNN Model 2 and 4. Among clinical variables, anticoagulant use, systolic and diastolic blood pressure, PT-INR, and time from onset to baseline CT were significant and were used as input in Multimodal CNN Model 2 and Non-CNN Model 2 and 4.
The performance of each model is shown in Table 2 and Fig. 5. CNN Model 1 and 2 had the same sensitivity, but specificity and AUC were higher in CNN Model 2. Therefore, intraparenchymal hematoma images, rather than original images, were used as input to Multimodal CNN Model 1 and 2 (Fig. 3). Seven neighbors achieved the highest sensitivity for Non-CNN Model 1 and 2. Sensitivity was higher for the CNN Models and Multimodal CNN Models than for the Non-CNN Models. In particular, CNN Model 1 and 2 and Multimodal CNN Model 2 achieved a sensitivity of 1.000 (95% confidence interval 0.768–1.000).

Discussion
A multimodal neural network model incorporating CNN analysis of CT images and neural network analysis of clinical variables showed a sensitivity of 1.000 for predicting hematoma expansion in spontaneous ICH. The model outperformed CNN analysis of CT images alone and machine learning analysis of CT findings and clinical variables. It integrates both imaging and clinical information. The multimodal model could be beneficial in clinical practice, as it identifies patients who require thorough and intensive care after admission. For clinicians treating ICH, the greatest concern in predicting hematoma expansion is missing expansion cases, because these patients may experience neurological deterioration and require careful observation in intensive care units 2,3. Not missing a single case at risk is critical in stroke care. Therefore, the goal of the prediction in this study was to achieve high sensitivity while maintaining a balanced AUC. In binary classification, binary cross-entropy is usually used as the loss function, and training aims to minimize it. However, a low binary cross-entropy loss does not always equate to high sensitivity. Thus, for testing, we did not simply select the model weights with the lowest validation loss, but selected those with better sensitivity and AUC. This selection, however, may have favorably influenced the reported model performance.
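A toy calculation illustrates why a lower binary cross-entropy need not mean higher sensitivity. The two prediction sets below are invented for illustration: one is very confident about the negatives but scores the single positive just below the 0.5 cut-off; the other is less confident overall but flags the positive.

```python
import numpy as np


def bce(y, p):
    """Mean binary cross-entropy."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def sensitivity(y, p, cutoff=0.5):
    """Fraction of positives with predicted probability at or above the cut-off."""
    y, p = np.asarray(y), np.asarray(p)
    return float(np.mean(p[y == 1] >= cutoff))


y = [1, 0, 0, 0]                      # one expansion case, three without
confident = [0.45, 0.01, 0.01, 0.01]  # lower loss, but misses the positive
cautious = [0.60, 0.40, 0.40, 0.40]   # higher loss, but catches the positive

print(bce(y, confident), sensitivity(y, confident))  # ~0.207, 0.0
print(bce(y, cautious), sensitivity(y, cautious))    # ~0.511, 1.0
```

Selecting weights purely by minimum loss would prefer the first set, even though it misses every expansion case.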
Multimodal CNN Model 2, which used CT images and selected clinical variables as input, showed the best performance. Its superior performance compared with CNN Model 2 underscores the importance of clinical information in predicting hematoma expansion. Notably, not all clinical variables were necessary; only five were used. That the model works with fewer inputs is important for its practical use, as it can be difficult to collect complete information in clinical settings. Furthermore, the superiority of the multimodal neural network models over the non-CNN models indicates that CNN analysis of CT images outperforms human-based evaluation of CT findings in predicting hematoma expansion, even when combined with clinical information.
One of the most challenging aspects of developing the CNN and multimodal CNN models was choosing the kernel sizes of the convolutional layers. Typically, a kernel size of 3, or at most 5, is used because larger kernels consume more computational power 28. However, in preliminary experiments, training diverged with a kernel size of 3 or 5, even when layers were added or the number of kernels was increased. The voxel size of the CT images was 0.5 × 0.5 × 2.0 mm, at which a kernel size of 3 or 5 may have been too small to extract features of the hematoma. Larger kernel sizes, up to 19 × 19 × 7, worked effectively in this study; we could not compare them with the kernel sizes used in other studies that applied CNNs to predict hematoma expansion, because their code was not disclosed 18–21,29.
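The physical footprint of a single first-layer kernel is simply kernel size times voxel size along each axis. The snippet below makes that arithmetic explicit, assuming the last kernel dimension runs along the slice axis; it ignores the further growth of the receptive field through pooling in deeper layers.

```python
def kernel_extent_mm(kernel, voxel_mm):
    """Physical extent (mm) covered by one convolution kernel along each axis."""
    return tuple(k * v for k, v in zip(kernel, voxel_mm))


print(kernel_extent_mm((3, 3, 3), (0.5, 0.5, 2.0)))    # (1.5, 1.5, 6.0)
print(kernel_extent_mm((19, 19, 7), (0.5, 0.5, 2.0)))  # (9.5, 9.5, 14.0)
# After the 512 -> 256 in-plane resize described in Preprocessing, each pixel spans 1.0 mm
print(kernel_extent_mm((19, 19, 7), (1.0, 1.0, 2.0)))  # (19.0, 19.0, 14.0)
```

A 3 × 3 kernel thus sees only about 1.5 mm in-plane at 0.5 mm pixels, which is consistent with the observation that small kernels struggled to capture hematoma-scale features.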
Several considerations have been suggested for the soundness of research using artificial intelligence (AI) techniques, such as the use of an external test set for the final report, transparency of algorithms, etc. 30,31 .However, many clinical studies have not actually followed these basic considerations.In this study, the model was trained on data from several hospitals and tested on external data.Multiple models were created for comparison.Clinical information, algorithm programming code, and model weights were disclosed to make our results verifiable and the model reproducible.
To our knowledge, two studies have predicted hematoma expansion in ICH by analyzing both CT images and clinical information with a CNN 29,32. In one study, hematoma features extracted by a CNN and by radiomics were analyzed together with clinical variables using a support vector machine 29. It achieved a sensitivity of 0.83 and an AUC of 0.95; however, the testing method was not described in detail, and patient data from a single hospital were used for both training and testing 29. The other study analyzed CNN-derived hematoma features and clinical variables using multivariate logistic regression, achieving a sensitivity of 0.76 and an AUC of 0.83 32. This was also a single-center study, and its sensitivity is too low for use in clinical practice. In our study, we achieved a sensitivity of 1.00, which is critical for clinical use in stroke management.
CNN Model 2, which used intraparenchymal hematoma images, outperformed CNN Model 1, which used unmodified original CT images. Therefore, intraparenchymal hematoma images were used as input for the multimodal models. However, these segmentations were performed manually in this study, because automated segmentation remains unsatisfactory in some cases, with inaccurate differentiation between intraparenchymal and intraventricular hematoma 33–35. Once intraparenchymal hematoma can be segmented more accurately and clinical variables can be collected automatically from the medical record, this prediction task could be fully automated.
Several limitations should be noted.First, although a perfect sensitivity of 1.000 was achieved in a multimodal neural network model while balancing AUC, the lower limit of the confidence interval was 0.768, indicating that more cases are needed for more reliable model validation.Second, although the external dataset from another hospital was used for testing, validation with various patient demographics is required to further ensure the reliability of the developed models.Third, the comparisons of the model performance were not supported by statistical significance testing; instead, simple comparisons were conducted among the models.More cases are also required to statistically demonstrate significant differences based on 95% confidence intervals.Fourth, the images were resized to 256 × 256 to fit the GPU memory.Analysis at the original 512 × 512 size may yield better results.Fifth, only k-nearest neighbors and fully connected neural networks were employed for non-CNN models.Other machine learning models may have performed better, but logistic regression, support vector machines, random forests, and gradient boosting were inferior to k-nearest neighbors in the previous report predicting hematoma expansion 14 .Sixth, this is a retrospective study.Validation with prospective data is required as a future step.Seventh, to apply the models to clinical practice, systems that comprehensively capture patient information, including images and clinical variables, are required.Last, although the clinical variables that are included in the study were generally collected from the patients in the clinical setting for stroke care, they are sometimes unavailable.A model capable of handling missing data may be beneficial.
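The first limitation can be quantified. Assuming the reported intervals are Clopper-Pearson exact intervals and that the test set contained 14 expansion cases (neither is stated in this section, but 14/14 reproduces the reported lower limit of 0.768 and 12/14 the reported sensitivity of 0.857), the sketch below computes the interval and the number of consecutively detected expansion cases needed to push the lower limit to a chosen target. The function names are hypothetical.

```python
from scipy.stats import beta


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion x/n."""
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return float(lower), float(upper)


def positives_needed(target_lower, alpha=0.05):
    """Smallest n for which detecting n of n cases gives a lower limit >= target_lower."""
    n = 1
    while clopper_pearson(n, n, alpha)[0] < target_lower:
        n += 1
    return n


print(clopper_pearson(14, 14))  # lower limit ~0.768, upper limit 1.0
print(positives_needed(0.90))   # 36
```

When all n cases are detected, the lower limit reduces to (alpha/2)^(1/n), so it rises only slowly with the number of expansion cases; under these assumptions, roughly 36 detected cases would be needed for a lower limit of 0.90.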

Conclusion
We developed a multimodal neural network model incorporating CNN analysis of CT images and neural network analysis of clinical variables to predict hematoma expansion in acute spontaneous ICH.The model was externally validated and outperformed CNN analysis of CT images alone and machine learning analysis of CT findings and clinical variables.The multimodal model achieved sufficient performance with a sensitivity of 1.000 to potentially enable decision support in clinical settings; it effectively identifies patients who require thorough and intensive care after admission.The algorithm programming code and model weights are available for verification and public use.To ensure the reliability of the models, validation with prospective datasets for various patient demographics is necessary as a future step.Furthermore, a model capable of handling missing data or systems that comprehensively capture patient information would be required to enable widespread use of predictive models in clinical practice.

[Patient selection flow chart, training and validation dataset: screened ICH patients for eligibility, n=1336; excluded, n=1063 (CT slice thickness > 2 mm, n=207; baseline CT at > 24 hours of onset, n=48; unknown onset time, n=384; no follow-up CT or follow-up CT at > 30 hours after baseline CT, n=155; secondary cause of ICH, n=64; surgical treatment before follow-up CT, n=146; insufficient data, n=59); included for the study, n=273.]

Figure 2. Baseline (a) and follow-up (b) CT images of a case with hematoma expansion. Intraparenchymal hematomas were manually segmented (green areas), and hematoma volumes were computed by planimetry. (a) For baseline images, segmented areas were marked as 1 and other areas as 0, and these were exported as masked images.

Figure 3. Intraparenchymal hematoma images generated from original and masked images.

Figure 4. (a) The CNN models were composed of four 3-dimensional convolutional layer blocks and one dense layer block, followed by a final dense layer block with sigmoid activation function. The kernel sizes of the convolutional layers were 19 × 19 × 7, 19 × 19 × 7, 14 × 14 × 5, and 11 × 11 × 4, respectively. (b) The multimodal CNN models consisted of an image part and a clinical-variables part. The architecture of the image part was the same as in the CNN models (a), except for the last block. The clinical-variables part consisted of two dense layer blocks. These two parts were concatenated, followed by a dense layer block and a final dense layer block.

Figure 5. Receiver operating characteristic curves for each model in Table 2, except for Non-CNN Model 1 and 2. These models were excluded because they do not return continuous values as a prediction.

https://doi.org/10.1038/s41598-024-67365-3

Three types of models were designed to predict hematoma expansion: (1) a CNN model, (2) a multimodal CNN model, and (3) a non-CNN model. A CNN model used CT images as input. A multimodal CNN model used CT images and clinical variables as input, combining CNN analysis of CT images and neural network analysis of clinical variables. A non-CNN model used human-assessed CT findings and clinical variables as input, which were analyzed with machine learning. For each type, several models were devised with different algorithms or inputs.
Because this was a retrospective study, the requirement for separate informed patient consent was waived by the following institutional review boards: Mie Chuo Medical Center institutional review board [permit number: MCERB-202321], Matsusaka Chuo General Hospital institutional review board [permit number: 325], Suzuka Kaisei Hospital institutional review board [permit number: 2020-05], and Mie University Hospital institutional review board [permit number: T2023-7]. All study protocols and procedures were conducted in accordance with the Declaration of Helsinki. This manuscript was prepared according to the Standards for Reporting of Diagnostic Accuracy (STARD) statement.

Table 1. Characteristics of the study population. Data are presented as n (%), mean ± standard deviation, or median (interquartile range). PT-INR = prothrombin time-international normalized ratio.

AUC was above 0.75 for CNN Model 2, Multimodal CNN Model 1 and 2, and Non-CNN Model 2, with Multimodal CNN Model 2 having the highest AUC. Specificity was highest for Non-CNN Model 1 and 4. Accuracy was highest for Non-CNN Model 4. Multimodal CNN Model 2 showed the highest sensitivity and AUC of all models. Its model weights are stored in OSF in HDF5 format (419 MB, https://osf.io/wm768).

Table 2. Test results for predicting hematoma expansion in each model. Data are presented as value (95% confidence interval). AUC = area under the receiver operating characteristic curve.
